fix(duckdb): use actual DuckDB schema for read provider by ewgenius · Pull Request #650 · datafusion-contrib/datafusion-table-providers

ewgenius · 2026-05-20T00:19:26Z

Problem

DuckDBTableProviderFactory::create() uses cmd.schema (user-facing types) as the read provider schema. DuckDB silently downgrades certain types during table creation (e.g. Timestamp(ns, tz) → Timestamp(µs, tz) for TIMESTAMPTZ). The read provider then advertises the wrong schema while returning batches with actual DuckDB types, causing:

RowConverter column schema mismatch, expected Timestamp(Nanosecond, Some("UTC")) got Timestamp(Microsecond, Some("UTC"))

This crashes ORDER BY queries on partitioned DuckDB-accelerated datasets.
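The failure can be modeled without DuckDB or Arrow. This is a hypothetical, std-only Rust sketch: `TimeUnit`, `DataType`, `duckdb_storage_type`, and `check_column` are illustrative stand-ins, not the arrow crate's types or RowConverter's API. It shows why advertising the requested type while returning the stored type fails a strict per-column type check.

```rust
// Hypothetical std-only model of the mismatch; these are NOT arrow-rs types.

/// Simplified stand-in for Arrow's time units.
#[derive(Debug, Clone, PartialEq)]
pub enum TimeUnit {
    Microsecond,
    Nanosecond,
}

/// Simplified stand-in for an Arrow data type (only what this sketch needs).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Timestamp(TimeUnit, Option<String>),
}

/// Models how DuckDB stores a requested column type: TIMESTAMPTZ keeps
/// microsecond precision, so a tz-aware nanosecond timestamp comes back
/// as microseconds. Other types pass through unchanged in this model.
pub fn duckdb_storage_type(requested: &DataType) -> DataType {
    match requested {
        DataType::Timestamp(TimeUnit::Nanosecond, Some(tz)) => {
            DataType::Timestamp(TimeUnit::Microsecond, Some(tz.clone()))
        }
        other => other.clone(),
    }
}

/// Mimics a per-column check like the one a row converter performs: the
/// column it receives must have exactly the type it was configured with.
pub fn check_column(expected: &DataType, got: &DataType) -> Result<(), String> {
    if expected == got {
        Ok(())
    } else {
        Err(format!(
            "RowConverter column schema mismatch, expected {:?} got {:?}",
            expected, got
        ))
    }
}
```

Configuring the check with the advertised (requested) type and feeding it the stored type reproduces the shape of the error quoted above; configuring it with the stored type passes.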

Fix

After table creation, query DuckDB for the actual storage schema via get_schema() and use that for the read provider. This matches the existing pattern in DuckDBTableFactory::table_provider().
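The control flow of the fix can be sketched with a hypothetical in-memory stand-in for DuckDB. `FakeDuckDb`, `create`, `Created`, and the simplified `DataType` are illustrative names, not this crate's API, and `get_schema` here only mimics the idea of the real call. The only part taken from this PR is the ordering: create the table, then ask the database for the schema it actually stored and use that single value for both the write-side TableDefinition and the read provider.

```rust
// Hypothetical std-only model of the fixed create() flow; none of these
// types are the datafusion-table-providers or DuckDB APIs.
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum TimeUnit {
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    Timestamp(TimeUnit, Option<String>),
}

/// Ordered (column name, type) pairs.
pub type Schema = Vec<(String, DataType)>;

/// In-memory stand-in for a DuckDB connection.
#[derive(Default)]
pub struct FakeDuckDb {
    tables: HashMap<String, Schema>,
}

impl FakeDuckDb {
    /// Stores the table; models TIMESTAMPTZ by downgrading tz-aware
    /// nanosecond timestamps to microseconds.
    pub fn create_table(&mut self, name: &str, requested: &Schema) {
        let stored: Schema = requested
            .iter()
            .map(|(col, ty)| {
                let ty = match ty {
                    DataType::Timestamp(TimeUnit::Nanosecond, Some(tz)) => {
                        DataType::Timestamp(TimeUnit::Microsecond, Some(tz.clone()))
                    }
                    other => other.clone(),
                };
                (col.clone(), ty)
            })
            .collect();
        self.tables.insert(name.to_string(), stored);
    }

    /// Stand-in for reading the stored schema back from the catalog.
    pub fn get_schema(&self, name: &str) -> Result<Schema, String> {
        self.tables
            .get(name)
            .cloned()
            .ok_or_else(|| format!("table {name} not found"))
    }
}

/// Schemas handed to the write path and the read provider.
pub struct Created {
    pub table_definition_schema: Schema,
    pub read_provider_schema: Schema,
}

/// Creates the table, then re-reads the actual schema and uses that one
/// value for both paths instead of the requested schema.
pub fn create(db: &mut FakeDuckDb, name: &str, requested: &Schema) -> Result<Created, String> {
    db.create_table(name, requested);
    let schema = db.get_schema(name)?;
    Ok(Created {
        table_definition_schema: schema.clone(),
        read_provider_schema: schema,
    })
}
```

Deriving both schemas from one post-creation read keeps the write path and the read path from drifting apart, which is the point raised in review.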

Test

Added test_read_provider_schema_reflects_actual_duckdb_types, which verifies that the read provider advertises Timestamp(Microsecond, ...) when DuckDB downgrades the requested Timestamp(Nanosecond, ...).

…d.schema

DuckDB may silently change types during table creation (e.g. Timestamp(ns, tz) becomes Timestamp(µs, tz) for TIMESTAMPTZ). The read provider must advertise the actual storage types so downstream operators (SortExec, RowConverter) receive batches matching the advertised schema. Query DuckDB via get_schema() after table creation to obtain the true storage schema, consistent with DuckDBTableFactory::table_provider(), which already does this correctly.

Fixes RowConverter column schema mismatch on ORDER BY with partitioned DuckDB acceleration.



* fix(duckdb): cast query_arrow results to projected_schema

  DuckDB's query_arrow ignored the projected_schema parameter and returned batches with DuckDB's native types (e.g. Timestamp(µs)) even when the caller expected different types (e.g. Timestamp(ns)). This caused schema mismatches for downstream operators pushed below SchemaCastScanExec. Cast result batches to projected_schema in the output stream when types differ. Add a shared cast_batch_to_schema utility in util/arrow.rs for reuse by other Arrow-native connectors (ADBC, ODBC).

* Revert "fix(duckdb): use actual DuckDB schema for read provider (#650)"

  This reverts commit 040aa83.
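The follow-up approach (cast result batches to the projected schema only when types differ) can be illustrated with a hypothetical, std-only model. `TimestampColumn`, `cast_column`, and this `cast_batch_to_schema` are illustrative: the real utility in util/arrow.rs operates on Arrow types, and its signature is not shown here.

```rust
// Hypothetical std-only model of casting query results to the projected
// schema; not the cast_batch_to_schema in util/arrow.rs.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TimeUnit {
    Microsecond,
    Nanosecond,
}

impl TimeUnit {
    /// Ticks of this unit per second.
    fn ticks_per_second(self) -> i64 {
        match self {
            TimeUnit::Microsecond => 1_000_000,
            TimeUnit::Nanosecond => 1_000_000_000,
        }
    }
}

/// A timestamp column: its unit plus raw i64 values (`None` = null).
#[derive(Debug, Clone, PartialEq)]
pub struct TimestampColumn {
    pub unit: TimeUnit,
    pub values: Vec<Option<i64>>,
}

/// Rescales one column to `target`. Returns a plain copy when the unit
/// already matches; upscaling uses checked multiplication, downscaling uses
/// integer division (truncates toward zero).
pub fn cast_column(col: &TimestampColumn, target: TimeUnit) -> Result<TimestampColumn, String> {
    if col.unit == target {
        return Ok(col.clone());
    }
    let (from, to) = (col.unit.ticks_per_second(), target.ticks_per_second());
    let values = col
        .values
        .iter()
        .map(|v| match *v {
            None => Ok(None),
            Some(x) if to > from => x
                .checked_mul(to / from)
                .map(Some)
                .ok_or_else(|| format!("timestamp {x} overflows when cast to {target:?}")),
            Some(x) => Ok(Some(x / (from / to))),
        })
        .collect::<Result<Vec<Option<i64>>, String>>()?;
    Ok(TimestampColumn { unit: target, values })
}

/// Casts every column of a "batch" to the unit its projected field expects.
pub fn cast_batch_to_schema(
    batch: &[TimestampColumn],
    projected: &[TimeUnit],
) -> Result<Vec<TimestampColumn>, String> {
    if batch.len() != projected.len() {
        return Err(format!("expected {} columns, got {}", projected.len(), batch.len()));
    }
    batch
        .iter()
        .zip(projected)
        .map(|(col, &unit)| cast_column(col, unit))
        .collect()
}
```

Skipping columns whose types already match keeps the common case cheap. Widening µs to ns multiplies by 1000 and can overflow for timestamps far from the epoch, so the sketch reports an error instead of wrapping.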

ewgenius self-assigned this May 20, 2026

ewgenius marked this pull request as draft May 20, 2026 00:23

ewgenius added the bug Something isn't working label May 20, 2026

ewgenius requested a review from phillipleblanc May 20, 2026 00:23

ewgenius force-pushed the evgenii/0520/duckdb-read-provider-use-actual-schema branch from 2bb1bce to 4ad04d8

ewgenius marked this pull request as ready for review May 20, 2026 00:30

phillipleblanc reviewed May 20, 2026


Comment thread core/src/duckdb.rs

ewgenius force-pushed the evgenii/0520/duckdb-read-provider-use-actual-schema branch 2 times, most recently from 640ef30 to 4ad04d8

fix: use actual DuckDB schema for TableDefinition as well

38c47b6

Address review feedback: read the actual DuckDB schema into the schema variable used by both TableDefinition (write path) and the read provider, rather than only fixing the read provider.

phillipleblanc reviewed May 20, 2026


Comment thread core/src/duckdb.rs

fix formatting

146e801

phillipleblanc approved these changes May 20, 2026


ewgenius merged commit 040aa83 into spiceai-52 May 20, 2026
12 checks passed

ewgenius deleted the evgenii/0520/duckdb-read-provider-use-actual-schema branch May 20, 2026 02:31

This was referenced May 20, 2026

fix: ORDER BY fails on partitioned DuckDB-accelerated tables with TIMESTAMPTZ columns spiceai/spiceai#10947

Merged

fix(duckdb): cast query_arrow results to projected_schema #652

Merged

ewgenius added a commit that referenced this pull request May 21, 2026

Revert "fix(duckdb): use actual DuckDB schema for read provider (#650)"

21d6f81

This reverts commit 040aa83.


ewgenius merged 3 commits into spiceai-52 from evgenii/0520/duckdb-read-provider-use-actual-schema

2 participants